Briefings in Bioinformatics
Oxford University Press (OUP)
All preprints, ranked by how well they match Briefings in Bioinformatics's content profile, based on 11 papers previously published here. The average preprint has a 0.04% match score for this journal, so anything above that is already an above-average fit. Older preprints may already have been published elsewhere.
Ma, Z.
Background: Exponential-like infection growth leading to a peak (which could be an inflection point or turning point) is usually the hallmark of infectious disease outbreaks, including those caused by coronaviruses. To predict the inflection points, i.e., the inflection time (Tmax) and maximal infection number (Imax) of the novel coronavirus disease (COVID-19), we adopted a trial-and-error strategy and explored a series of approaches, from simple logistic modeling (which has an asymptotic line) to sophisticated tipping point detection techniques for detecting phase transitions, but failed to obtain satisfactory results. Method: Inspired by its success in the diversity-time relationship (DTR), we apply the PLEC (power law with exponential cutoff) model to detect the inflection points of COVID-19 outbreaks. The model was previously used to extend the classic species-time relationship (STR) to the general DTR (Ma 2018). It has three parameters: a power law scaling parameter w, a taper-off parameter d that ultimately overwhelms the virtually exponential growth, and a parameter c related to initial infections. Two "secondary" parameters are computed from these three: one, originally used for estimating potential or dark biodiversity, is proposed to estimate the maximal infection number (Imax), and the other is proposed to determine the corresponding inflection time point (Tmax). Results: We successfully estimated the inflection points [Imax, Tmax] for most provinces (approximately 85%) in China, with error rates <5% in both Imax and Tmax. We also discuss the constraints and limitations of the proposed approach: (i) sensitivity to disruptive jumps, (ii) the need for sufficiently long datasets, and (iii) restriction to unimodal outbreaks.
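As a rough illustration of how the two secondary parameters could follow from the three fitted ones, the sketch below assumes the PLEC curve has the form I(t) = c * t^w * exp(-d * t) with d > 0. The abstract does not state the functional form or sign convention, so that form, the function names, and the parameter values are all assumptions. Under this form, setting the derivative to zero gives Tmax = w / d and Imax = c * (w / d)^w * exp(-w).

```python
import math


def plec(t, c, w, d):
    """Assumed PLEC form: I(t) = c * t**w * exp(-d * t), with d > 0."""
    return c * t ** w * math.exp(-d * t)


def plec_peak(c, w, d):
    """Return (Tmax, Imax) for the assumed PLEC form.

    dI/dt = I(t) * (w / t - d), which vanishes at t = w / d.
    """
    if w <= 0 or d <= 0:
        raise ValueError("a finite peak requires w > 0 and d > 0")
    t_max = w / d
    i_max = c * t_max ** w * math.exp(-w)
    return t_max, i_max


# Hypothetical parameter values, chosen only for illustration.
t_max, i_max = plec_peak(c=10.0, w=2.0, d=0.1)
```

In practice c, w, and d would first be fitted to the observed case counts; the peak then follows in closed form rather than by searching the curve.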
Huang, T.; Lin, K.-H.; Vieira, R. M.; Soares, J. C.; Jiang, X.; Kim, Y.
Early detection of potential side effects (SEs) is a critical and challenging task for drug discovery and patient care. In vitro or in vivo approaches to detecting potential SEs do not scale to the many drug candidates at the preclinical stage. Recent advances in explainable machine learning may facilitate detecting potential SEs of new drugs before market release and elucidating the critical mechanisms of biological action. Here, we leverage multi-modal interactions among molecules to develop a biologically informed graph-based SE prediction model, called HHAN-DSI. HHAN-DSI predicted frequent and even uncommon SEs of unseen drugs with higher or comparable accuracy against benchmark methods. When applied to the central nervous system, the organ system with the largest number of SEs, the model revealed previously unknown but probable SEs of diverse psychiatric medications, together with the potential mechanisms of action through a network of genes, biological functions, drugs, and SEs.
Li, V. O. K.; Han, Y.; Kaistha, T.; Zhang, Q.; Downey, J.; Gozes, I.; Lam, J. C. K.
Alzheimer's disease (AD) severely impairs human dignity and quality of life. While newly approved amyloid immunotherapy has been reported, effective AD drugs remain to be identified. Here, we propose a novel AI-driven drug-repurposing method, DeepDrug, to identify a lead combination of approved drugs to treat AD patients. DeepDrug advances drug-repurposing methodology in four aspects. Firstly, it incorporates expert knowledge to extend candidate targets to include long genes, immunological and aging pathways, and somatic mutation markers that are associated with AD. Secondly, it incorporates a signed directed heterogeneous biomedical graph encompassing a rich set of nodes and edges, and node/edge weighting to capture crucial pathways associated with AD. Thirdly, it encodes the weighted biomedical graph through a Graph Neural Network into a new embedding space to capture the granular relationships across different nodes. Fourthly, it systematically selects high-order drug combinations via diminishing return-based thresholds. A five-drug lead combination, consisting of Tofacitinib, Niraparib, Baricitinib, Empagliflozin, and Doxercalciferol, has been selected from the top drug candidates based on DeepDrug scores to achieve the maximum synergistic effect. These five drugs target neuroinflammation, mitochondrial dysfunction, and glucose metabolism, which are all related to AD pathology. DeepDrug offers a novel AI-and-big-data, expert-guided mechanism for new drug combination discovery and drug-repurposing across AD and other neurodegenerative diseases, with immediate clinical applications.
Li, L.; Ma, Z.
Background: The relationships between a tumor and its microbiome are still puzzling, with possible bidirectional interactions. Tumor microbiomes may suppress or stimulate tumor growth on the one hand; on the other hand, tumor growth may exert selection pressure on its microbiomes. There is no consensus on the mode and/or extent of the bidirectional interactions. The objective of this study is to estimate the selection pressure from the primary tumors on tumor microbiomes by comparing it with the selection pressure from the solid normal tissues on their corresponding tissue microbiomes across 20+ cancer types. Methods: We apply Sloan near neutral theory and big datasets of tumor tissue microbiomes from the TCGA (The Cancer Genome Atlas) databases to achieve the above objective. The near neutral theory model can determine the proportions of above-neutral, neutral and below-neutral species in microbial communities, corresponding to positive, neutral and negative selection pressures from host tissues. By comparing the proportions between the primary tumors and solid normal tissues, we can infer the selection pressure of tumor growth on tissue microbiomes. Results: We find that approximately 65% of species in solid normal tissue microbiomes are neutral, and the proportion is only 40% in the primary tumor microbiomes. In contrast, the proportion of positively selected species exceeds 60% in the primary tumor microbiomes. Furthermore, simulations with the neutral theory model reveal that the most abundant species are mostly neutral, while non-neutral species are in the long tail of the species abundance distributions. Conclusions: Tumor growth exerts strong positive selection on resident microbiomes, driving the abundances of certain species above the levels expected by the neutral process. Nevertheless, neutral species are still among the most abundant species, suggesting the necessity to pay close attention to the low-abundance or rare species because they are likely to play a critical role in oncogenesis.
Peng, K.; Moore, J.; Brito, J.; Kao, G.; Burkhardt, A. M.; Alachkar, H.; Mangul, S.
T cell receptor (TCR) studies have grown substantially with advances in T cell receptor repertoire sequencing (TCR-Seq). Analyzing TCR-Seq data requires the computational skills to run TCR repertoire analysis tools. However, biomedical researchers with limited computational backgrounds face numerous obstacles to properly and efficiently utilizing bioinformatics tools for analyzing TCR-Seq data. Here we report pyTCR, a computational notebook-based platform for comprehensive and scalable TCR-Seq data analysis. Computational notebooks, which combine code, calculations, and visualization, provide users with a high level of flexibility and transparency for the analysis. Additionally, computational notebooks have been demonstrated to be user-friendly and suitable for researchers with limited computational skills. Our platform has a rich set of functionalities including various TCR metrics, statistical analysis, and customizable visualizations. Applying pyTCR to large and diverse TCR-Seq datasets will enable effective and flexible analysis of large-scale TCR-Seq data, and eventually facilitate new discoveries.
Ma, Z.; Li, L.; Mei, J.
It is postulated that the tumor tissue microbiome is one of the enabling characteristics that either promote or suppress the acquisition by cancer cells and tumors of certain hallmarks (functional traits) of cancer, which highlights its critical importance to carcinogenesis, cancer progression and therapy responses. However, characterizing tumor microbiomes is extremely challenging because of their low biomass and the severe difficulty of controlling laboratory-borne contaminants, which is further aggravated by the lack of comprehensively effective computational approaches to identify unique or enriched microbial species associated with cancers. Here we take advantage of two recent computational advances: one by Poore et al. (2020, Nature), who computationally generated the microbiome datasets of 33 cancer types [from 10481 patients, including primary tumor (PT), solid normal tissue (NT), and blood samples] from whole-genome and whole-transcriptome data deposited in "The Cancer Genome Atlas" (TCGA), and another termed the "specificity diversity framework" (SDF), developed recently by Ma (2023). By reanalyzing the Poore et al. datasets with the SDF, further augmented with complex network analysis, we produced catalogues of microbial species (archaea, bacteria and viruses) with statistical rigor, including unique species (USs) and enriched species (ESs) in PT, NT, or blood tissues. We further reconstructed the species specificity network (SSN) and cancer microbiome heterogeneity network (CHN) to identify core/periphery network structures, from which we gain insights into the codependency of microbial species distribution on the landscape of cancer types; the codependency appears to be universal across all cancer types.
Nam, Y.; Yun, J.-S.; Lee, S. M.; Park, J. W.; Chen, Z.; Lee, B.; Verma, A.; Ning, X.; Shen, L.; Kim, D.
Currently, the number of patients with COVID-19 has significantly increased. Thus, there is an urgent need for developing treatments for COVID-19. Drug repurposing, which is the process of reusing already-approved drugs for new medical conditions, can be a good way to solve this problem quickly and broadly. Many clinical trials for COVID-19 patients using treatments for other diseases have already been in place or will be performed at clinical sites in the near future. Additionally, patients with comorbidities such as diabetes mellitus, obesity, liver cirrhosis, kidney diseases, hypertension, and asthma are at higher risk for severe illness from COVID-19. Thus, the relationship of comorbidity disease with COVID-19 may help to find repurposable drugs. To reduce trial and error in finding treatments for COVID-19, we propose building a network-based drug repurposing framework to prioritize repurposable drugs. First, we utilized knowledge of COVID-19 to construct a disease-gene-drug network (DGDr-Net) representing a COVID-19-centric interactome with components for diseases, genes, and drugs. DGDr-Net consisted of 592 diseases, 26,681 human genes and 2,173 drugs, and medical information for 18 common comorbidities. The DGDr-Net recommended candidate repurposable drugs for COVID-19 through network reinforcement driven scoring algorithms. The scoring algorithms determined the priority of recommendations by utilizing graph-based semi-supervised learning. From the predicted scores, we recommended 30 drugs, including dexamethasone, resveratrol, methotrexate, indomethacin, quercetin, etc., as repurposable drugs for COVID-19, and the results were verified with drugs that have been under clinical trials. The list of drugs via a data-driven computational approach could help reduce trial-and-error in finding treatment for COVID-19.
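The abstract does not spell out the scoring algorithm, so the following is only a generic sketch of graph-based semi-supervised scoring, not the authors' method. It uses a common label-propagation update, f <- alpha * P f + (1 - alpha) * y, where P is the row-normalized adjacency matrix and y holds the seed labels. The toy graph, node names, and the `propagate_scores` function are hypothetical.

```python
def propagate_scores(adjacency, seeds, alpha=0.8, iterations=50):
    """Iterate f <- alpha * P f + (1 - alpha) * y on a small graph.

    adjacency: dict mapping node -> dict of neighbor -> edge weight
    seeds: dict mapping node -> initial label score (e.g. 1.0 for the disease)
    P is the row-normalized adjacency matrix.
    """
    nodes = list(adjacency)
    y = {n: seeds.get(n, 0.0) for n in nodes}
    f = dict(y)
    for _ in range(iterations):
        updated = {}
        for n in nodes:
            total = sum(adjacency[n].values())
            spread = 0.0
            if total > 0:
                spread = sum(w * f[m] for m, w in adjacency[n].items()) / total
            updated[n] = alpha * spread + (1 - alpha) * y[n]
        f = updated
    return f


# Hypothetical toy network: drug_X reaches the disease through gene_A,
# while drug_Y sits in a component that is disconnected from it.
graph = {
    "COVID-19": {"gene_A": 1.0},
    "gene_A": {"COVID-19": 1.0, "drug_X": 1.0},
    "drug_X": {"gene_A": 1.0},
    "gene_B": {"drug_Y": 1.0},
    "drug_Y": {"gene_B": 1.0},
}
scores = propagate_scores(graph, seeds={"COVID-19": 1.0})
```

Drugs would then be ranked by their propagated score; a drug with no path to the seed node keeps a score of zero.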
Kapitanov, G. I.; Head, S. A.; Flowers, D.; Apgar, J. F.; Grant, J.
Blinatumomab is a bispecific T-cell engager (BiTE) that binds to CD3 on T cells and CD19 on B cells. It has been approved for use in B-cell acute lymphoblastic leukemia (B-ALL) with a regimen that requires continuous infusion (cIV) for four weeks per treatment cycle. It is currently in clinical trials for Non-Hodgkin lymphoma (NHL) with cIV administration. Recently, there have been studies investigating dose-response after subcutaneous (SC) dosing in B-ALL and in NHL to determine whether this more convenient method of delivery would have a similar efficacy/safety profile as continuous infusion. We constructed mechanistic PKPD models of blinatumomab activity in B-ALL and NHL patients, investigating the amount of CD3:blinatumomab:CD19 trimers the drug forms at different dosing administrations and regimens. The modeling and analysis demonstrate that the explored SC doses in B-ALL and NHL achieve similar trimer numbers as the cIV doses in those indications. We further simulated various subcutaneous dosing regimens, and identified conditions where trimer formation dynamics are similar between constant infusion and subcutaneous dosing. Based on the model results, subcutaneous dosing is a viable and convenient strategy for blinatumomab and is projected to result in similar trimer numbers as constant infusion.
Xu, Z.; Peng, Q.; Song, J.; Zhang, H.; Wei, D.; Demongeot, J.
DVGs (Defective Viral Genomes) and SIP (Semi-Infectious Particle) are commonly present in RNA virus infections. In this study, we analyzed high-throughput sequencing data and found that DVGs or SIPs are also widely present in SARS-CoV-2. Comparison of SARS-CoV-2 with various DNA viruses revealed that the SARS-CoV-2 genome is more susceptible to damage and has greater sequencing sample heterogeneity. Variability analysis at the whole-genome sequencing depth showed a higher coefficient of variation for SARS-CoV-2, and DVG analysis indicated a high proportion of splicing sites, suggesting significant genome heterogeneity and implying that most virus particles assembled are enveloped with incomplete RNA sequences. We further analyzed the characteristics of different strains in terms of sequencing depth and DVG content differences and found that as the virus evolves, the proportion of intact genomes in virus particles increases, which can be significantly reflected in third-generation sequencing data, while the proportion of DVG gradually decreases. Specifically, the proportion of intact genome of Omicron was greater than that of Delta and Alpha strains. This can well explain why Omicron strain is more infectious than Delta and Alpha strains. We also speculate that this improvement in completeness is due to the enhancement of virus assembly ability, as the Omicron strain can quickly realize the binding of RNA and capsid protein, thereby shortening the exposure time of exposed virus RNA in the host environment and greatly reducing its degradation level. Finally, by using mathematical modeling, we simulated how DVG effects under different environmental factors affect the infection characteristics and evolution of the population. We can explain well why the severity of symptoms is closely related to the amount of virus invasion and why the same strain causes huge differences in population infection characteristics under different environmental conditions. 
Our study provides a new approach for future virus research and vaccine development.
Shokrof, M.; Abuelanin, M.; Brown, C. T.; Mansour, T. A.
Long-read sequencing (LRS) enables variant calling of high-quality structural variants (SVs). Genotypers of SVs utilize these precise call sets to increase the recall and precision of genotyping in short-read sequencing (SRS) samples. With the extensive growth in availability of SRS datasets in recent years, we should be able to calculate accurate population allele frequencies of SVs. However, reprocessing hundreds of terabytes of raw SRS data to genotype new variants is impractical for population-scale studies, a computational challenge known as the N+1 problem. Solving this computational bottleneck is necessary to analyze new SVs from the growing number of pangenomes in many species, public genomic databases, and pathogenic variant discovery studies. To address the N+1 problem, we propose The Great Genotyper, a population genotyping workflow. Applied to a human dataset, the workflow begins by preprocessing 4.2K short-read samples totaling 183TB of raw data to create an 867GB Counting Colored De Bruijn Graph (CCDG). The Great Genotyper uses this CCDG to genotype a list of phased or unphased variants, leveraging the CCDG population information to increase both precision and recall. The Great Genotyper offers the same accuracy as state-of-the-art genotypers with the addition of unprecedented performance. It took 100 hours to genotype 4.5M variants in the 4.2K samples using one server with 32 cores and 145GB of memory. A similar task would take months or even years using single-sample genotypers. The Great Genotyper opens the door to new ways to study SVs. We demonstrate its application in finding pathogenic variants by calculating accurate allele frequencies for novel SVs. Also, a premade index is used to create a 4K reference panel by genotyping variants from the Human Pangenome Reference Consortium (HPRC). The new reference panel allows for SV imputation from genotyping microarrays.
Moreover, we genotype the GWAS catalog and merge its variants with the 4K reference panel. We show 6.2K events of high linkage between the HPRC SVs and nearby GWAS SNPs, which can help in interpreting the effect of these SVs on gene functions. This analysis uncovers the detailed haplotype structure of the human fibrinogen locus and revives the pathogenic association of a 28 bp insertion in the FGA gene with thromboembolic disorders.
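Once a variant has been genotyped across a cohort, its population allele frequency is a simple count. The sketch below is not taken from The Great Genotyper; it assumes diploid, biallelic calls written as VCF-style genotype strings, and the `allele_frequency` helper and sample calls are hypothetical.

```python
def allele_frequency(genotypes):
    """Alternate-allele frequency from diploid genotype strings.

    Accepts phased ("0|1") or unphased ("0/1") calls. Missing alleles (".")
    are skipped, and any non-"0" allele is counted as alternate, which is
    only appropriate for biallelic sites.
    """
    alt = 0
    called = 0
    for gt in genotypes:
        for allele in gt.replace("|", "/").split("/"):
            if allele == ".":
                continue
            called += 1
            if allele != "0":
                alt += 1
    return alt / called if called else None


# Four hypothetical samples: 3 alternate alleles out of 6 called alleles.
af = allele_frequency(["0/0", "0/1", "1|1", "./."])
```

Returning `None` when no allele is called keeps an uncalled variant distinct from one with a true frequency of zero.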
Rodriguez Messan, M.; Yogurtcu, O. N.; McGill, J. R.; Nukala, U.; Sauna, Z. E.; Yang, H.
Cancer vaccines are an important component of the cancer immunotherapy toolkit, enhancing the immune response to malignant cells by activating CD4+ and CD8+ T cells. Multiple successful clinical applications of cancer vaccines have shown good safety and efficacy. Despite the notable progress, significant challenges remain in obtaining consistent immune responses across heterogeneous patient populations, as well as various cancers. We present as a proof of concept a mechanistic mathematical model describing key interactions of a personalized neoantigen cancer vaccine with an individual patient's immune system. Specifically, the model considers the vaccine concentration of tumor-specific antigen peptides and adjuvant, the patient's major histocompatibility complex I and II copy numbers, tumor size, T cells, and antigen presenting cells. We parametrized the model using patient-specific data from a recent clinical study in which individualized cancer vaccines were used to treat six melanoma patients. Model simulations predicted both immune responses to the vaccine, represented by T cell counts, and clinical outcome (determined as change in tumor size). These kinds of models have the potential to lay the foundation for generating in silico clinical trial data and aid the development and efficacy assessment of personalized cancer vaccines. Author summary: Personalized cancer vaccines have gained attention in recent years due to the advances in sequencing techniques that have facilitated the identification of multiple tumor-specific mutations. This type of individualized immunotherapy has the potential to be specific, efficacious, and safe since it induces an immune response to protein targets not found on normal cells. This work focuses on understanding and analyzing important mechanisms involved in the activity of personalized cancer vaccines using a mechanistic mathematical model.
This model describes the interactions of a personalized neoantigen peptide cancer vaccine, the human immune system and tumor cells operating at the molecular and cellular level. The molecular level captures the processing and presentation of neoantigens by dendritic cells to the T cells using cell surface proteins. The cellular level describes the differentiation of dendritic cells due to peptides and adjuvant concentrations in the vaccine, activation, and proliferation of T cells in response to treatment, and tumor growth. The model captures immune response behavior to a vaccine associated with patient specific factors (e.g., different initial tumor burdens). Our model serves as a proof of concept displaying its utility in clinical outcomes prediction, lays foundation for developing in silico clinical trials, and aids in the efficacy assessment of personalized vaccines.
Izoulet, M.
COVID-19 (Coronavirus Disease-2019) is an international public health problem with a high rate of severe clinical cases. Several treatments are currently being tested worldwide. This paper focuses on anti-malarial drugs such as chloroquine or hydroxychloroquine. We compare the dynamics of COVID-19 daily deaths in countries that used anti-malarial drugs as a treatment from the start of the epidemic with those in countries that did not, over the day of the 3rd death and the following 10 days. We then use ARIMA modeling to produce a short-term forecast of death dynamics for each group. We show that the first group has much slower dynamics in daily deaths than the second group. This ecological study is of course only one additional piece of evidence in the debate regarding the efficacy of anti-malarial drugs, and it is also limited, as the two groups certainly have other systemic differences in the way they responded to the pandemic, in the way they report deaths, or in their populations that better explain the differences in dynamics. Nevertheless, the difference in dynamics of daily deaths is so striking that we believe it is useful to present these results as a clue in research on the efficacy of hydroxychloroquine. In the end, these data might ultimately be either a piece of evidence in favor of anti-malarial drugs or a stepping stone toward understanding what other ecological aspects play a role in the dynamics of COVID-19 deaths.
Mousavi, R.; Mustafa Ali, M. K.; Lobo, D.
Acute Myeloid Leukemia (AML) is a complex and heterogeneous disease characterized by severe clinical progression, fast cellular proliferation, and often high mortality rates. Incorporating diverse longitudinal information on patients' medical histories is essential for developing effective disease predictive models applicable to both research and clinical settings. Here, we present a robust methodology for discovering dynamic predictive models to elucidate AML disease progression dynamics from a novel longitudinal multimodal clinical dataset of patients diagnosed with AML. The clinical dataset was analyzed to reveal the main clinical, genetic, and treatment features modulating disease progression. To discover mathematical models (including interactions, parameters, and nodes) predictive of AML progression, we present an explainable machine learning algorithm based on high-performance evolutionary computation. The results demonstrate that the predictive methodology could accurately estimate the clinical dynamics of AML progression in terms of blast percentages for both training and novel patients. This study demonstrates that the developed explainable machine learning approach can successfully predict AML progression by leveraging the heterogeneous and longitudinal dynamics of patients' clinical data. More importantly, this methodology shows significant potential for application in modeling the progression dynamics of other acute diseases, providing a flexible and adaptable framework for advancing clinical and translational research.
Xu, Q.; Liu, X.; Jiang, X.; Kim, Y.
Motivation: This study aims to develop an AI-driven framework that leverages large language models (LLMs) to simulate scientific reasoning and peer review to predict efficacious combinatorial therapy when data-driven prediction is infeasible. Results: Our proposed framework achieved a significantly higher accuracy (0.74) than traditional knowledge-based prediction (0.52). An ablation study highlighted the importance of high-quality few-shot examples, external knowledge integration, self-consistency, and review within the framework. The external validation with private experimental data yielded an accuracy of 0.82, further confirming the framework's ability to generate high-quality hypotheses in biological inference tasks. Our framework offers an automated knowledge-driven hypothesis generation approach when data-driven prediction is not a viable option. Availability and implementation: Our source code and data are available at https://github.com/QidiXu96/Coated-LLM
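The abstract names self-consistency as one component but does not describe how it is implemented. In the LLM literature the term usually means sampling several independent answers to the same prompt and keeping the majority answer. The sketch below shows only that voting step, under that assumption; the function name and the answer labels are hypothetical, and the model calls themselves are omitted.

```python
from collections import Counter


def self_consistent_answer(samples):
    """Return the most common sampled answer and its share of the votes."""
    if not samples:
        raise ValueError("need at least one sampled answer")
    counts = Counter(samples)
    answer, votes = counts.most_common(1)[0]
    return answer, votes / len(samples)


# Five hypothetical answers sampled from a model for one drug pair.
sampled = [
    "synergistic",
    "synergistic",
    "not synergistic",
    "synergistic",
    "not synergistic",
]
answer, agreement = self_consistent_answer(sampled)
```

The agreement fraction can double as a rough confidence signal, for example to route low-agreement predictions to an additional review step.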
Matov, A.
Introduction: Each piece of cell-free DNA (cfDNA) has a length determined by the exact metabolic conditions in the cell it belonged to at the time of cell death. Changes in cellular regulation lead to a variety of patterns, based on the different numbers of fragments with lengths up to several hundred base pairs (bp) at each of the almost three billion genomic positions; these patterns allow for the detection of disease and also the precise identification of the tissue of origin. Methods: A Kullback-Leibler (KL) divergence computation identifies the fragment lengths and areas of the human genome, depending on the stage, for which disease samples, starting from pre-clinical disease stages, diverge from healthy individual samples. We provide examples of genes related to colorectal cancer (CRC) that our algorithm detected as belonging to divergent genomic bins. The staging of CRC can be viewed as a Markov chain, which provides a framework for studying disease progression and the types of epigenetic changes occurring longitudinally at each stage, and which might aid the correct classification of a new hospital sample. Results: If such data are treated as grayscale images, pattern recognition using artificial intelligence could be one approach to classification. In CRC, Stage I disease does not, for the most part, shed any tumor DNA into circulation, making detection difficult for established machine learning (ML) methods. This leads to the deduction that early detection, where we can only rely on changes in the metabolic patterns, can be accomplished when the information is considered in its entirety, for example by applying computer vision methods. Conclusions: Longitudinal analysis of patients' genetic datasets can detect the early stages of neoplasm better than population-based methods.
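To make the KL divergence step concrete, the sketch below compares two fragment-length histograms for a single genomic bin. It is a generic illustration rather than the author's pipeline: the direction D_KL(disease || healthy), the pseudocount, and the toy counts are all assumptions.

```python
import math


def kl_divergence(p_counts, q_counts, pseudocount=1.0):
    """D_KL(P || Q) in bits between two fragment-length histograms.

    Both inputs are lists of counts over the same length bins. A pseudocount
    is added to every bin so that empty bins do not produce log(0).
    """
    if len(p_counts) != len(q_counts):
        raise ValueError("histograms must share the same bins")
    p = [c + pseudocount for c in p_counts]
    q = [c + pseudocount for c in q_counts]
    p_total, q_total = sum(p), sum(q)
    return sum(
        (pi / p_total) * math.log2((pi / p_total) / (qi / q_total))
        for pi, qi in zip(p, q)
    )


# Hypothetical fragment counts per length bin for one genomic region.
healthy = [5, 40, 120, 60, 10]
disease = [20, 80, 90, 30, 5]
divergence = kl_divergence(disease, healthy)
```

Repeating this per genomic bin and ranking bins by divergence is one way to flag regions where disease samples depart from the healthy profile.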
Zhang, X. T.; Han, R. H.
A massive number of transcriptomic profiles of blood samples from COVID-19 patients has been produced since the COVID-19 pandemic began; however, these big data from primary studies have not been well integrated by machine learning approaches. Taking advantage of modern machine learning algorithms, we collected and integrated single-cell RNA-seq (scRNA-seq) data from three independent studies, identified genes potentially useful for interpreting severity, and developed AImmune, a high-performance deep learning-based deconvolution model that can predict the proportions of seven different immune cell types from bulk RNA-seq results of human peripheral blood mononuclear cells (PBMCs). This novel approach can be used for clinical blood testing of COVID-19, on the grounds that previous research shows that mRNA alterations in blood-derived PBMCs may serve as a severity indicator. Assessed on real-world data sets, the AImmune model outperformed the most recognized immune profiling model, CIBERSORTx. The study showed that results obtained by the true scRNA-seq route can be consistently reproduced by AImmune, indicating its potential to replace the costly scRNA-seq technique for the analysis of circulating blood cells for both clinical and research purposes.
Pugalenthi, P. V.; Xie, L.; He, B.; Nho, K.; Saykin, A. J.; Yan, J.
Alzheimer's disease (AD) is a highly heritable brain dementia accompanied by substantial failure of cognitive function. Large-scale genome-wide association studies (GWAS) have identified a substantial set of SNPs associated with AD and related traits. GWAS hits usually emerge as clusters in which a lead SNP with the highest significance is surrounded by other less significant neighboring SNPs. Although functionality is not guaranteed even for the strongest associations in a GWAS, the lead SNPs have historically been the focus of the field, with the remaining associations treated as redundant. Recent deep genome annotation tools enable the prediction of function from a segment of DNA sequence with significantly improved precision, which allows in silico mutagenesis to interrogate the functional effect of SNP alleles. In this project, we explored the impact of top AD GWAS hits on chromatin functions, and whether that impact is altered by the genomic context (i.e., alleles of neighboring SNPs). Our results showed that highly correlated SNPs in the same LD block could have distinct impacts on downstream functions. Although some GWAS lead SNPs showed a dominating functional effect regardless of the neighboring SNP alleles, several others showed enhanced loss or gain of function under certain genomic contexts, suggesting potential extra information hidden in the LD blocks.
Mondillo, G.; Perrotta, A.; Colosimo, S.; Frattolillo, V.
The advanced Large Language Model ChatGPT4o, developed by OpenAI, can be used in the field of bioinformatics to analyze and understand cross-reactive allergic reactions. This study explores the use of ChatGPT4o to support research on allergens, particularly in the cross-reactivity syndrome between cat and pork. Using a hypothetical clinical case of a child with a confirmed allergy to Fel d 2 (cat albumin) and Sus s 1 (pork albumin), the model guided data collection, protein sequence analysis, and three-dimensional structure visualization. Through the use of bioinformatics tools like SDAP 2.0 and BepiPRED, the epitope regions of the allergenic proteins were predicted, confirming their accessibility to immunoglobulin E (IgE) and probability of cross-reactivity. The results show that regions with high epitope probability exhibit high surface accessibility and predominantly coil and helical structures. The construction of a phylogenetic tree further supported the evolutionary relationships among the studied allergens. ChatGPT4o has demonstrated its usefulness in guiding non-specialist researchers through complex bioinformatics processes, making advanced science accessible and improving analytical and innovation capabilities.
Kwon, E.-J.; Hwang, H. S.; Chang, E.; An, J.-Y.; Cha, H.-J.
Conventional chemotherapeutics exploit cancer's hallmark of active cell cycling, primarily targeting mitotic cells. Consequently, the mitotic index (MI), representing the proportion of cells in mitosis, serves as both a prognostic biomarker for cancer progression and a predictive marker for chemo-responsiveness. In this study, we developed a transcriptome signature to predict chemotherapeutic responsiveness based on the Active Mitosis Signature Enrichment Score (AMSES), a computational metric previously established to estimate active mitosis using multi-omics data from The Cancer Genome Atlas (TCGA) lung cancer cohorts, lung adenocarcinoma (LUAD) and lung squamous cell carcinoma (LUSC) patients. Leveraging advanced machine learning techniques, we enhanced the predictive power of AMSES and developed AMSES for chemo-responsiveness, termed A4CR. Comparative analysis revealed a strong correlation between A4CR and the MI of 69 cases from a separate non-small cell lung cancer (NSCLC) cohort. The utility of A4CR as a therapeutic biomarker was validated through in silico analysis of public datasets, encompassing transcriptomic profiles of cancer cell lines (CCLs) and their corresponding multiple drug response data as well as clinicogenomic data from TCGA. These findings highlight the potential of integrating gene signatures with machine learning and large-scale datasets to advance precision oncology and improve therapeutic decision-making for cancer patients.
Subedi, S.; Park, Y. P.
Gene expression variation in cancer cells is attributed to many inherited and environmental factors, including genetic variants and cellular landscapes. Decomposing different sources of information is intractable with single-cell RNA-seq alone. However, we show that our new approach can split them with the help of multiple patients, assuming that cell types are widely shared and genetic effects are specifically present in a particular patient. Our approach based on a cross-attention neural network was applied to three different cancer types to identify cell types and patient-specific genetic effects in transcriptomic data. Residual expressions, excluding cell types, can implicate patient-specific disease mechanisms.